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Scale  changes  everything 


[Figure: Complexity plotted against Data Size and Volume]
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WHAT  IS  BIG  DATA? 


FROM  A  SOFTWARE  ARCHITECTURE 
PERSPECTIVE  ... 
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Some  Big  Data 


Google: 

•  Gmail  alone  is  in  the  exabyte  range 

Salesforce.com

• Handles 1.3 billion transactions per day

Pinterest.com 

•  0  to  10s  of  billions  of  page  views  a  month  in  two 
years, 

•  from  2  founders  and  one  engineer  to  over  40 
engineers, 

•  from  one  MySQL  server  to  180  Web  Engines, 

240  API  Engines,  88  MySQL  DBs  +  1  slave  each, 
110  Redis  Instances,  and  200  Memcache 
Instances. 
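To put these volumes in perspective, they can be converted into average per-second rates. The sketch below assumes a 30-day month and a load spread evenly over time, which understates real peaks. It uses 10 billion as an illustrative lower bound for Pinterest's "10s of billions" of monthly page views.

```python
# Convert the quoted daily/monthly volumes into average per-second rates.
# Assumptions: 30-day month, load spread evenly (real peaks are much higher).
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000


def average_rate(events: float, seconds: float) -> float:
    """Average number of events per second over the given period."""
    return events / seconds


salesforce_tps = average_rate(1.3e9, SECONDS_PER_DAY)   # transactions per second
pinterest_pvps = average_rate(10e9, SECONDS_PER_MONTH)  # page views per second (lower bound)

print(f"Salesforce: ~{salesforce_tps:,.0f} transactions/s")
print(f"Pinterest:  ~{pinterest_pvps:,.0f} page views/s")
```

This works out to about 15,000 transactions per second for Salesforce and about 3,900 page views per second for Pinterest, on average. The architecture has to be sized for peak load, which is higher.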




http://highscalability.com/blog/2014/2/3/how-google-backs-up-the-internet-along-with-exabytes-of-othe.html
http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html
http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html
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Not  so  successful 


Some  first-wave  big  data  projects  'written  down' 
says  Deloitte 


Not  enough  data  a  problem  for  some,  while  Hadoop  integration  has 
proved  tricky 

By Simon Sharwood, 19 Feb 2014

Consultancy outfit Deloitte reckons early big data projects have had to be written down because they failed, thanks in part to a “buy it and the benefits will come” mentality.

The source of failure was sometimes difficulty making open source software work and/or integrate with other systems, Deloitte Australia's technology consulting partner Tim Nugent told The Reg. Such failures weren't because the software was of poor quality. Instead, organisations weren't able to make it do meaningful work because they lacked the skills to do so. Integrating big data tools with other systems also proved difficult.

The attempt to develop those skills while also staying abreast of the many changes in the field of big data proved hard for some, Nugent said. Happily, vendors and services providers have since come up to speed and are making


Why  Most  Big  Data  Projects  Fail  + 
How  to  Make  Yours  Succeed 

By Darin Bartik | May 14, 2013
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Big data is on the minds of just about everyone, with IT departments large and small grappling with exponentially growing volumes of both structured and unstructured data. But despite big data’s place as a mainstream IT phenomenon, the bulk of big data projects still fail, as organizations struggle to find ways to capture, manage, make sense of, and ultimately derive value from their data and information.


•  Lack  of  knowledge.  Many  of  the  technologies,  approaches  and  disciplines 
around  big  data  are  new,  so  people  lack  the  knowledge  about  how  to 
actually  work  with  the  data  and  accomplish  a  business  result. 


Software  Engineering  Institute 


Carnegie Mellon University


Software  Architecture: 

Trends  and  New  Directions 
#SEIswArch 

©  2014  Carnegie  Mellon  University 


Big  Data  Survey 


[illegible]% OF BIG DATA PROJECTS

Are not completed
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WHEN IT COMES TO BIG DATA PROJECTS,
THE MOST SIGNIFICANT CHALLENGE FAI[...]


INACCURATE  SCOPE 
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FINDING  THE 
RIGHT  TOOLS 


FINDING  TALENT 
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TOP  REQUIREMENTS  OF 

BIG  DATA 

SOLUTIONS 
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MANAGEMENT 

ABILITY
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Big Data - State of the practice
“The problem is not solved”

Building  scalable,  assured  big  data  systems  is  hard 

•  Healthcare.gov 

•  Netflix  -  Christmas  Eve  2012  outage 

•  Amazon  -  19  Aug  2013  -  45  minutes  of  downtime  =  $5M  lost  revenue 

•  Google  -  16  Aug  2013  -  homepage  offline  for  5  minutes 

•  NASDAQ  -  June  2012  -  Facebook  IPO 
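The Amazon figure implies a steep cost for each minute of downtime. This is a back-of-the-envelope sketch that assumes the lost revenue accrued evenly over the 45-minute outage.

```python
# Average revenue lost per minute, from the Amazon outage figure above.
# Assumption: revenue loss is spread evenly across the outage.
outage_minutes = 45
lost_revenue_usd = 5_000_000

loss_per_minute = lost_revenue_usd / outage_minutes
print(f"~${loss_per_minute:,.0f} per minute of downtime")
```

That comes to roughly $111,000 for every minute the site was unavailable.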

Building  scalable,  assured  big  data  systems  is  expensive 

•  Google,  Amazon,  Facebook,  et  al. 

-  More  than  a  decade  of  investment 

-  Billions  of  $$$ 

•  Many  application-specific  solutions  that  exploit  problem-specific  properties 

-  No  such  thing  as  a  general-purpose  scalable  system 

•  Cloud  computing  lowers  cost  barrier  to  entry  -  now  possible  to  fail  cheaper 
and  faster 
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NoSQL  -  Horizontally-scalable  database 
technology 

Designed  to  scale  horizontally  and  provide 
high  performance  for  a  particular  type  of 
problem 

• Most originated to solve a particular system problem/use case


•  Later  were  generalized  (somewhat)  and 
many  are  available  as  open-source 
packages 


Large  variety  of: 

•  Data  models 

•  Query  languages 

•  Scalability  mechanisms 

•  Consistency  models,  e.g. 

-  Strong 

-  Eventual 
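A toy in-memory store can illustrate the difference between these consistency models. This is a minimal sketch and does not describe any particular NoSQL product. Writes are applied to one replica immediately and copied to the others later (asynchronous replica update), so a read from another replica can return stale data until propagation completes. The class names are hypothetical.

```python
from __future__ import annotations

from collections import deque


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class AsyncReplicatedStore:
    """Hypothetical eventually consistent store: a write is applied to a
    coordinator replica at once and queued for the other replicas."""

    def __init__(self, n: int = 3) -> None:
        self.replicas = [Replica(f"r{i}") for i in range(n)]
        self.pending: deque[tuple[int, str, str]] = deque()

    def write(self, key: str, value: str, coordinator: int = 0) -> None:
        self.replicas[coordinator].data[key] = value
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.pending.append((i, key, value))  # asynchronous replica update

    def read(self, key: str, replica: int) -> str | None:
        return self.replicas[replica].data.get(key)

    def propagate(self) -> None:
        """Deliver every queued update in order; afterwards all replicas agree."""
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value


store = AsyncReplicatedStore()
store.write("x", "v1")
print(store.read("x", replica=2))  # None: stale read, update not yet propagated
store.propagate()
print(store.read("x", replica=2))  # v1: replicas have converged
```

A strongly consistent store would not let the first read return the stale value. It would pay for that with extra coordination on every write or read.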


NoSQL  Landscape 
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Horizontal  Scaling  Distributes  Data 

(and  adds  complexity) 


Distributed  systems  theory  is  hard  but  well-established 

•  Lamport’s  “Time,  clocks  and  ordering  of  events”  (1978), 
“Byzantine  generals”  (1982),  and  “Part-time  parliament”  (1990) 

•  Gray’s  “Notes  on  database  operating  systems”  (1978) 

•  Lynch’s  “Distributed  algorithms”  (1996,  906  pages) 
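Lamport's 1978 paper introduced logical clocks, which order events without synchronized physical clocks. The sketch below shows the basic clock rules. The class and method names are illustrative: each process advances its counter on every event, and on receipt it jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Logical clock: if event a causally precedes b, then time(a) < time(b)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the new time is attached to the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, move past both the local clock and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()               # p: 1
t = p.send()           # p: 2, message carries 2
q.tick()               # q: 1
q.receive(t)           # q: max(1, 2) + 1 = 3
print(p.time, q.time)  # 2 3
```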

Implementing  the  theory  is  hard,  but  possible 

•  Google’s  “Paxos  made  live”  (2007) 

Introduces  fundamental  tradeoff  among  “CAP”  qualities 

•  Consistency,  Availability,  Partition  tolerance  (see  Brewer) 

• “When a Partition occurs, trade off Availability against Consistency;
Else trade off Latency against Consistency” (PACELC, see Abadi)

“A distributed system is one in which the failure of a computer
you didn’t even know existed can render your own computer
unusable”
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Rule  of  Thumb: 

Scalability decreases as implementation
complexity grows


Workload


•  #  of  concurrent  sessions  and  operations 

•  Operation  mix  (create,  read,  update,  delete) 

•  Generally,  each  system  use  case  represents  a 
distinct  and  varying  workload 
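Because each use case represents a distinct workload, prototypes and benchmarks are usually driven by a synthetic operation mix. The sketch below is a minimal, repeatable CRUD workload generator. The mix proportions are hypothetical and would come from the system's actual use cases.

```python
import random
from collections import Counter

# Hypothetical operation mix for one read-heavy use case (fractions sum to 1).
READ_HEAVY_MIX = {"create": 0.05, "read": 0.80, "update": 0.10, "delete": 0.05}


def generate_workload(mix: dict[str, float], n_ops: int, seed: int = 42) -> list[str]:
    """Sample a sequence of CRUD operations whose proportions follow `mix`."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("operation mix must sum to 1")
    rng = random.Random(seed)  # fixed seed: the same sequence on every run
    ops, weights = zip(*mix.items())
    return rng.choices(ops, weights=weights, k=n_ops)


ops = generate_workload(READ_HEAVY_MIX, 10_000)
print(Counter(ops).most_common())
```

Defining a separate mix for each use case, such as a write-heavy ingest path and a read-heavy query path, lets a prototype report scalability per workload instead of as one average.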

Data  Sets 

•  Number  of  records 

•  Record  size 

•  Record  structure  (e.g.,  sparse  records) 

•  Homogeneity/heterogeneity  of  structure/schema 

•  Consistency 


[Figure: Scalability versus Complexity of Solution, with points for simple queries with eventual consistency, strong consistency, and machine learning]
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Big  Data  - 

A  complex  software  engineering  problem 


Big  data  technologies  implement  data 
models  and  mechanisms  that: 

•  Can  deliver  high  performance,  availability  and 
scalability 

•  Don’t  deliver  a  free  lunch 

-  Consistency 

-  Distribution 

-  Performance 

-  Scalability 

-  Availability 

-  System  management 

•  Major  differences  between  big  data  models/ 
technologies  introduce  complexity 
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Software Engineering at Scale


Key  Concept: 

•  system  capacity  must  scale  faster  than 
cost/effort 

-  Adopt  approaches  so  that  capacity 
scales  faster  than  the  effort  needed  to 
support  that  capacity. 

- Scalable systems at predictable costs

Approaches:

•  Scalable  software  architectures 

•  Scalable  software  technologies 

•  Scalable  execution  platforms 
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SO WHAT ARE WE DOING
AT  THE  SEI? 
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Enhancing  Design  Knowledge  for  Big  Data 
Systems 


Design  knowledge  repository 
for  big  data  systems 

•  Navigate 

•  Search 

•  Extend 

•  Capture  Trade-offs 

Technology  selection  method 
for  big  data  systems 

•  Comparison 

•  Evaluation  Criteria 

•  Benchmarking 


Design 

Expertise 
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LEAP4BD 

Lightweight  Evaluation  and  Architecture 
Prototyping  for  Big  Data  (LEAP4BD) 

Aims 

•  Risk  reduction 

•  Rapid,  streamlined  selection/acquisition 
Steps 

1. Assess the system context and landscape


2.  Identify  the  architecturally-significant  requirements 
and  decision  criteria 

3.  Evaluate  candidate  technologies  against  quality 
attribute  decision  criteria 

4.  Validate  architecture  decisions  and  technology 
selections  through  focused  prototyping 
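One common way to carry out step 3 is a weighted scoring matrix. This is not necessarily the scoring LEAP4BD uses. The sketch below assumes criteria and weights from step 2, and every criterion name, weight, candidate and rating in it is a hypothetical placeholder, not a measured result.

```python
# Hypothetical decision criteria from step 2, with weights that sum to 1.
WEIGHTS = {"write_throughput": 0.4, "consistency_options": 0.3, "operability": 0.3}

# Hypothetical 1-5 ratings per candidate (placeholders, not measured results).
SCORES = {
    "candidate_a": {"write_throughput": 5, "consistency_options": 4, "operability": 3},
    "candidate_b": {"write_throughput": 3, "consistency_options": 5, "operability": 4},
    "candidate_c": {"write_throughput": 4, "consistency_options": 2, "operability": 5},
}


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of weight * rating over all criteria (KeyError if a rating is missing)."""
    return sum(weights[c] * scores[c] for c in weights)


ranking = sorted(SCORES, key=lambda name: weighted_score(SCORES[name], WEIGHTS), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(SCORES[name], WEIGHTS):.2f}")
```

The top-ranked candidates would then move on to step 4, where focused prototyping tests whether the paper ratings hold up under the real workload.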




Prototyping 
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Some  Example  Scalability  Prototypes  -  Cassandra 


[Figure: Overall Throughput]
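Scalability prototypes such as these typically measure throughput as the number of concurrent clients grows. The harness below is a minimal sketch with two caveats. `fake_write` is a hypothetical in-memory stand-in for a real database client call. Also, in CPython the GIL means the stand-in itself will not speed up with more threads. Against a real database, clients spend most of their time waiting on network I/O, so threads do provide meaningful concurrency.

```python
import threading
import time
from typing import Callable


def measure_throughput(operation: Callable[[], None], n_clients: int, ops_per_client: int) -> float:
    """Run `operation` from n_clients concurrent threads; return completed ops/second."""

    def client() -> None:
        for _ in range(ops_per_client):
            operation()

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return n_clients * ops_per_client / elapsed


# Hypothetical stand-in for a database write; swap in a real client call.
store: dict[int, int] = {}
store_lock = threading.Lock()


def fake_write() -> None:
    with store_lock:
        store[len(store)] = 1


for clients in (1, 2, 4, 8):
    rate = measure_throughput(fake_write, n_clients=clients, ops_per_client=10_000)
    print(f"{clients} clients: {rate:,.0f} ops/s")
```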
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QuABase  - 

A  Knowledge  Base  for  Big  Data  System  Design 


[Screenshot: Wikipedia portal (English, The Free Encyclopedia, 4 453 000+ articles)]
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Português
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Semantics-based  Knowledge  Model 


General  model  of  software  architecture 
knowledge 

Populated  with  specific  big  data 
architecture  knowledge 


Dynamic,  generated,  and  queryable  content 
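The QuABase demo pages appear to use Semantic MediaWiki `#ask` queries over categorized pages. The idea of generated, queryable content can be sketched with a tiny subject/predicate/object store. The facts below come from the "Ensure read/write quorums" tactic page in this deck. The predicate names (`improves`, `reduces`, `supportsTactic`, ...) and the `query` helper are hypothetical and are not QuABase's actual schema.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Facts from the "Ensure read/write quorums" tactic page; predicate names are hypothetical.
KB: set[Triple] = {
    ("Ensure read/write quorums", "improves", "Consistency"),
    ("Ensure read/write quorums", "reduces", "Performance"),
    ("Ensure read/write quorums", "reduces", "Availability"),
    ("Ensure read/write quorums", "relatedTactic", "Hinted handoffs"),
    ("Cassandra", "supportsTactic", "Ensure read/write quorums"),
    ("MongoDB", "supportsTactic", "Ensure read/write quorums"),
    ("Riak", "supportsTactic", "Ensure read/write quorums"),
}


def query(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t for t in KB
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


# Which products support the quorum tactic?
print([t[0] for t in query(p="supportsTactic", o="Ensure read/write quorums")])
# Which qualities does the tactic trade away?
print([t[2] for t in query(s="Ensure read/write quorums", p="reduces")])
```

Because pages are generated from queries like these, adding a new product or tactic page updates every listing that matches it.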


Knowledge Visualization




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QuABase  Demo 


QuASIpedia - Architecture Quality At Scale

Editing Main Page


== Quality Attributes ==
{{#ask: [[Category:Quality attribute]] | format=ul }}

== Database Technologies ==
{{#ask: [[Category:Database]]
| intro=Select any of the databases below to get information on their features and the tactics they support
| format=ul
}}


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QuABase  Demo 
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Description 


Consistency issues in distributed systems stem from replication and the spatial separation of data objects, and from cases where two or more objects must be updated together to maintain logical consistency. Both issues occur commonly in big data systems, and hence consistency is a fundamental quality attribute for big data systems.

General Scenario for Consistency


Stimulus

A write to a single data object is issued (OR)

A single writer updates two or more objects to maintain consistency between them (OR)

Two writers attempt to update the same object simultaneously

Environment

Distributed database with replication (OR)

Non-distributed and non-replicated database (OR)

Cached database access

Response

Read-after-write consistency: after a write operation on data object X, the new value will always be seen by readers of X at some time in the future

Updates to two or more data objects by a single writer result in consistent values across the objects, through either successful updates or an error that rolls back object values to their previous state

Response Measure

Time for all object replicas to store the same value after a write succeeds

Multiple objects are updated successfully together, or an error is issued and they are returned to their previous state

Quality Attribute Scenarios and Tactics for Consistency

• Ensure eventual consistency in a replicated, distributed database
- Asynchronous replica update
- Hinted handoffs

• Ensure eventual consistency when making multiple object updates
- Distributed transactions
- Conflict resolution

• Ensure strong consistency for a write-write conflict
- Conflict resolution
- Ensure read/write quorums
- Queued writes

• Ensure strong consistency in a replicated, distributed database for a single object update
- Ensure read/write quorums
- Read from master only
- Write to all replicas

• Ensure strong consistency in a replicated, distributed database for multiple object updates
- Distributed transactions
- Denormalized data model

• Ensure strong consistency in an unreplicated, non-distributed database for multiple object updates
- Denormalization (nested records)
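Hinted handoff, one of the eventual-consistency tactics listed above, can be sketched as follows. This is a simplified illustration with two assumptions. A single designated fallback node holds the hint, and the handoff is triggered manually when the owner recovers. Real systems such as Cassandra and Riak differ in where hints are stored and how their delivery is scheduled. All names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: dict[str, str] = {}
        # Hints held on behalf of other nodes: owner name -> [(key, value), ...]
        self.hints: dict[str, list[tuple[str, str]]] = defaultdict(list)


def write(key: str, value: str, preferred: list[Node], fallback: Node) -> None:
    """Write to each preferred replica; for any replica that is down, the
    fallback stores the update with a hint naming the intended owner."""
    for node in preferred:
        if node.up:
            node.data[key] = value
        else:
            fallback.hints[node.name].append((key, value))


def hand_off(fallback: Node, recovered: Node) -> None:
    """When the owner recovers, replay its hinted writes and drop the hints."""
    for key, value in fallback.hints.pop(recovered.name, []):
        recovered.data[key] = value


a, b, c, d = (Node(n) for n in "abcd")
b.up = False
write("k", "v1", preferred=[a, b, c], fallback=d)
print(b.data.get("k"))  # None: b missed the write while it was down
b.up = True
hand_off(d, b)
print(b.data.get("k"))  # v1: the hint has been delivered
```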

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Ensure  read/write  quorums 

Description

Assuming there are N replicas of any object, a writer may specify that a quorum of the replicas must be updated before the write succeeds. This ensures that a majority of the replicas are updated before the write completes. If all writers perform quorum writes, this also prevents write-write conflicts, as only one writer can ever achieve quorum at any instant.

To  ensure  all  readers  see  the  updated  value  after  any  write  completes,  readers  must  also  specify  that  a  quorum  of  object  values  must  be  the  same  before  the  read  succeeds.  This  ensures  that  a 
reader  cannot  see  a  value  at  a  replica  that  has  not  yet  been  updated  with  the  new  value. 

In  either  case,  if  a  quorum  of  replica  objects  cannot  be  written  to  or  read  from,  the  operation  fails. 

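The general form of the requests needed to achieve strong consistency is: $Q_r + Q_w > N$ and $Q_w > N/2$, where $Q_r$ and $Q_w$ are the numbers of replicas that must respond to a read and acknowledge a write, respectively.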
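A counting (pigeonhole) argument shows why these two conditions are enough. Let $W$ be the set of $Q_w$ replicas that acknowledged a write, let $W_1$ and $W_2$ be the write quorums of two different writes, and let $R$ be the set of $Q_r$ replicas consulted by a later read. All of these sets are drawn from the same $N$ replicas:

```latex
\begin{aligned}
|R \cap W| &= |R| + |W| - |R \cup W| \;\ge\; Q_r + Q_w - N \;\ge\; 1 && \text{if } Q_r + Q_w > N \\
|W_1 \cap W_2| &\ge 2Q_w - N \;\ge\; 1 && \text{if } Q_w > N/2
\end{aligned}
```

So every read quorum contains at least one replica that holds the latest acknowledged write. Any two write quorums share at least one replica, which underlies the claim that only one writer can achieve quorum at any instant.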

A  number  of  NoSQL  databases  provide  quorum  mechanisms  for  readers  and  writers  to  be  able  to  tune  consistency.  This  is  typically  specified  on  a  per-write  call  to  enable  each  write  to  be  tuned 
accordingly. 
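A small simulation can exercise the quorum rules. This is a minimal sketch under two assumptions. A single writer assigns increasing version numbers, and a read returns the highest-versioned value among the $Q_r$ replies, which is a common alternative to requiring the replies to agree. It does not model concurrent writers. `QuorumStore` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str | None


class QuorumStore:
    """N replicas; a write must reach Qw replicas, a read consults Qr replicas
    and returns the highest-versioned value among the replies."""

    def __init__(self, n: int, qr: int, qw: int) -> None:
        # Strong-consistency conditions: Qr + Qw > N and Qw > N/2 (i.e. 2*Qw > N).
        if not (qr + qw > n and 2 * qw > n):
            raise ValueError("need Qr + Qw > N and Qw > N/2")
        self.n, self.qr, self.qw = n, qr, qw
        self.replicas = [Versioned(0, None) for _ in range(n)]
        self.version = 0  # single writer assigns increasing versions

    def write(self, value: str, reachable: list[int]) -> None:
        if len(reachable) < self.qw:
            raise RuntimeError("write quorum not reached")  # the operation fails
        self.version += 1
        # Only the first Qw reachable replicas are updated; the rest stay stale.
        for i in reachable[: self.qw]:
            self.replicas[i] = Versioned(self.version, value)

    def read(self, reachable: list[int]) -> str | None:
        if len(reachable) < self.qr:
            raise RuntimeError("read quorum not reached")  # the operation fails
        replies = [self.replicas[i] for i in reachable[: self.qr]]
        return max(replies, key=lambda r: r.version).value


store = QuorumStore(n=3, qr=2, qw=2)
store.write("v1", reachable=[0, 1, 2])  # replicas 0 and 1 take the write
print(store.read(reachable=[1, 2]))     # v1 (replica 1 overlaps the write quorum)
print(store.read(reachable=[2, 0]))     # v1 (replica 0 overlaps the write quorum)
```

With N = 3 and $Q_r = Q_w = 2$, every read quorum overlaps the write quorum in at least one replica. The read therefore returns v1 even though replica 2 was never updated. Products with tunable consistency expose similar per-request choices through their own client APIs.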


Improves Quality: Consistency

Reduces Quality: Performance, Availability

Related Tactics: Hinted handoffs

Implementations


This  tactic  is  supported  by  the  feature  Tunable  consistency  of  the  product  Cassandra. 

This  tactic  is  supported  by  the  feature  Tunable  consistency  of  the  product  MongoDB. 

This  tactic  is  supported  by  the  feature  Tunable  consistency  of  the  product  Riak. 
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Edit Tactic: Ensure read/write quorums
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[Screenshot: the QuABase tactic edit form, showing Description (Required) with the same description text as the tactic page; Improves QA: Consistency; Reduces QA: Performance, Availability; Related Tactics: Hinted handoffs; implementation entries for Product: Cassandra (Feature: Tunable consistency, with a datastax.com documentation link) and Product: MongoDB (Feature: Tunable consistency); and an "Add another" button]
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
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Tactics  Supported  by  Riak 
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Riak  Consistency  Features 

Database: Riak

[Callout: LEAP4BD Evaluation Features]

Object-level isolation on updates: supported

ACID transactions in single database: not supported

Distributed ACID transactions: not supported

Specify quorum reads/writes: in client

Specify number of replicas to write to: in client

Behaviour when write cannot complete on specified number of replicas: no rollback; write returns replication error

Writes configured to never fail: supported

Specify number of replicas to read from: in client

Read from replica master only: not supported

Updates applied to transaction log before returning from write: supported

Object-level timestamps to detect conflicts: supported

Efficient protocol to rapidly propagate updates across replicas (minimize inconsistency window): by default

Categories: Consistency Features | Strong Consistency | Eventual Consistency
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Status 


LEAP4BD 

•  Initial  trial  with  DoD  client  near  completion 

•  Rolling  out  as  an  SEI  service 
QuABase 

•  Design/development  in  progress 

•  Validation/testing  over  summer 

Software  Engineering  for  Big  Data  Course  (1  day)  and  tutorial  (1/2  day) 

• SATURN 2014 in Portland, May 2014

• http://www.sei.cmu.edu/saturn/2014/courses/

• WICSA in Sydney, Australia, April 2014

•  Both  available  on  request 
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Exponential  data  growth  from  the  Internet,  low  cost  sensors,  and  high  fidelity  instruments  has  fueled  the 
development  of  advanced  analytics  operating  on  vast  data  repositories.  These  analytics  bring  business 
benefits  ranging  from  web  content  personalization  to  predictive  maintenance  of  aircraft  components.  To 
construct  the  data  repositories  that  underpin  these  systems,  there  has  been  rapid  innovation  in 
distributed  data  management  technologies,  employing  schema-less  data  models  and  relaxing 
consistency  guarantees  to  satisfy  scalability  and  availability  requirements.  This  paper  describes  the 
challenges of these “big data” systems that confront software architects. We show how distributed
software architecture quality attributes are tightly linked to both the data and deployment
architectures.  This  causes  a  consolidation  of  concerns,  and  designs  must  be  closely  harmonized 
across  these  three  structures  to  satisfy  quality  requirements. 
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